An Evaluation of the Contextual Spelling Checker

Author

  • Graeme Hirst
Abstract

Microsoft Office Word 2007 includes a “contextual spelling checker” that is intended to find misspellings that nonetheless form correctly spelled words. In an evaluation on 1400 examples, it is found to have high precision but low recall: it fails to find most errors, but when it does flag a possible error, it is almost always correct. However, its performance in terms of F is inferior to that of the trigram-based method of Mays, Damerau, and Mercer (1991).

1 Real-word spelling correction

Most spelling checkers attempt to detect and correct only misspellings that result in a presumed nonword, that is, a word that is not listed in the system’s dictionary. They therefore cannot deal with an error that just happens to form a real word in the dictionary, albeit not the word that the user intended. There has been much research in recent years on methods for detecting and correcting such real-word errors, or malapropisms (we use the two terms interchangeably); for a review, see Hirst and Budanitsky (2005) and Wilcox-O’Hearn et al. (2008).

The recently released Microsoft Office Word 2007 includes a “contextual spelling corrector” that attempts to detect and correct real-word errors (Microsoft 2006). A word that the system believes to be in error is flagged with a wavy blue underline, in contrast to Word’s regular red underline for nonword errors, and suggested corrections are available in a pop-up menu or in the ‘Spelling and Grammar’ window. The system operates not only on content words but also on closed-class words (e.g., too and to). It can detect cases where a word has been wrongly split into two (e.g., through out for throughout), as well as missing or spurious apostrophes (e.g., corporations for corporation’s). It is not limited to a predefined set of frequently confounded words.
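Real-word errors such as too for to can only be caught by consulting context. The following is a minimal Python sketch of the word-trigram approach of Mays, Damerau, and Mercer (1991) as it is commonly summarized. Each real word in a sentence is assumed to have been typed as intended with probability α. The remaining 1 − α is shared evenly among the intended word's real-word spelling variants, taken here to be dictionary words one edit away. At most one error per sentence is assumed, and the hypothesis with the highest language-model probability times channel probability is chosen. The names `edits1`, `TrigramModel`, and `correct`, the add-one smoothing, the toy corpus, and the default α are illustrative assumptions. This is not the authors' implementation, and it says nothing about how Word 2007 works internally.

```python
import math
from collections import Counter
from typing import List, Sequence, Set

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> Set[str]:
    """Strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    return deletes | transposes | replaces | inserts


class TrigramModel:
    """Toy word-trigram language model with add-one smoothing (an assumption)."""

    def __init__(self, sentences: Sequence[Sequence[str]]) -> None:
        self.trigrams: Counter = Counter()
        self.histories: Counter = Counter()
        self.vocab: Set[str] = set()
        for sentence in sentences:
            toks = ["<s>", "<s>", *sentence, "</s>"]
            self.vocab.update(toks)
            for i in range(2, len(toks)):
                self.trigrams[tuple(toks[i - 2:i + 1])] += 1
                self.histories[tuple(toks[i - 2:i])] += 1

    def logprob(self, sentence: Sequence[str]) -> float:
        """Log P(sentence) as a sum of smoothed trigram log probabilities."""
        toks = ["<s>", "<s>", *sentence, "</s>"]
        v = len(self.vocab)
        total = 0.0
        for i in range(2, len(toks)):
            num = self.trigrams[tuple(toks[i - 2:i + 1])] + 1
            den = self.histories[tuple(toks[i - 2:i])] + v
            total += math.log(num / den)
        return total


def correct(sentence: List[str], lm: TrigramModel, alpha: float = 0.99) -> List[str]:
    """Return the most probable intended sentence, assuming at most one error."""
    words = lm.vocab - {"<s>", "</s>"}
    log_ok = math.log(alpha)
    # Baseline hypothesis: every word was typed as intended.
    best, best_score = list(sentence), lm.logprob(sentence) + len(sentence) * log_ok
    for i, observed in enumerate(sentence):
        if observed not in words:
            continue  # nonword errors are left to an ordinary spelling checker
        for cand in (edits1(observed) & words) - {observed}:
            # P(observed | cand): the 1 - alpha mass is split among cand's own variants.
            n_variants = max(len((edits1(cand) & words) - {cand}), 1)
            log_err = math.log((1 - alpha) / n_variants)
            hyp = sentence[:i] + [cand] + sentence[i + 1:]
            score = lm.logprob(hyp) + (len(sentence) - 1) * log_ok + log_err
            if score > best_score:
                best, best_score = hyp, score
    return best


if __name__ == "__main__":
    # Hypothetical toy corpus, repeated so the trigram counts carry some weight.
    corpus = [
        ["i", "went", "to", "the", "store"],
        ["she", "went", "to", "the", "park"],
        ["i", "have", "two", "cats"],
        ["we", "went", "to", "the", "store"],
        ["me", "too"],
    ] * 5
    lm = TrigramModel(corpus)
    print(correct(["i", "went", "too", "the", "store"], lm, alpha=0.9))
    # expected: ['i', 'went', 'to', 'the', 'store']
```

On this toy corpus the trigram evidence for "went to the" outweighs the channel penalty for assuming an error, so too is replaced by to. A correctly typed sentence is left alone. Lowering α makes the method more willing to propose corrections, which tends to trade precision for recall.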
Here we report an evaluation of this system, carried out to compare it with the word-trigram method of Mays et al. (1991) and the lexical cohesion method of Hirst and Budanitsky (2005). (A detailed evaluation of the Mays et al. method and a comparison with the lexical cohesion method are given by Wilcox-O’Hearn et al. (2008).)

∗This research was supported financially by the Natural Sciences and Engineering Research Council of Canada. I am grateful to Amber Wilcox-O’Hearn for comments and assistance.
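The abstract's summary (high precision and low recall, yet a lower F than the trigram method) follows from how these measures combine. The sketch below is a minimal illustration that assumes the usual definitions and the balanced F-measure (β = 1). The paper's exact counting conventions are not given in this excerpt. The `Counts` class, the helper names, and the counts in the demo are hypothetical and are not the figures from this evaluation.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Detection counts for a real-word error checker on a test set."""
    true_positives: int   # flagged words that really were errors
    false_positives: int  # correctly spelled words that were wrongly flagged
    false_negatives: int  # errors the checker failed to flag


def precision(c: Counts) -> float:
    """Fraction of flagged words that were genuine errors."""
    flagged = c.true_positives + c.false_positives
    return c.true_positives / flagged if flagged else 0.0


def recall(c: Counts) -> float:
    """Fraction of genuine errors that were flagged."""
    actual = c.true_positives + c.false_negatives
    return c.true_positives / actual if actual else 0.0


def f_measure(c: Counts, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    p, r = precision(c), recall(c)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    # Hypothetical counts, not the figures reported in this evaluation.
    cautious = Counts(true_positives=8, false_positives=2, false_negatives=32)
    balanced = Counts(true_positives=20, false_positives=20, false_negatives=20)
    for name, c in [("cautious", cautious), ("balanced", balanced)]:
        print(name, round(precision(c), 2), round(recall(c), 2), round(f_measure(c), 2))
```

The hypothetical cautious checker has precision 0.8 but recall 0.2, so its F1 is 0.32. That is below the 0.5 of the balanced checker, even though the balanced checker is wrong on half of the words it flags. Because the harmonic mean is dominated by the smaller of the two values, a checker that almost never errs when it flags can still lose on F if it flags too little.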



Publication date: 2008